Very Large-Scale Singular Value Decomposition Using Tensor Train Networks
We propose new algorithms for singular value decomposition (SVD) of very
large-scale matrices based on a low-rank tensor approximation technique called
the tensor train (TT) format. The proposed algorithms can compute several
dominant singular values and corresponding singular vectors for large-scale
structured matrices given in a TT format. The computational complexity of the
proposed methods scales logarithmically with the matrix size under the
assumption that both the matrix and the singular vectors admit low-rank TT
decompositions. The proposed methods, which are called the alternating least
squares for SVD (ALS-SVD) and modified alternating least squares for SVD
(MALS-SVD), compute the left and right singular vectors approximately through
block TT decompositions. The very large-scale optimization problem is reduced
to sequential small-scale optimization problems, and each core tensor of the
block TT decompositions can be updated by applying any standard optimization
method. The optimal ranks of the block TT decompositions are determined
adaptively during the iteration process, so that we can achieve high approximation
accuracy. Extensive numerical simulations are conducted for several types of
TT-structured matrices such as Hilbert matrices, Toeplitz matrices, random matrices
with prescribed singular values, and tridiagonal matrices. The simulation results
demonstrate the effectiveness of the proposed methods compared with standard
SVD algorithms and TT-based algorithms developed for symmetric eigenvalue
decomposition.
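The logarithmic scaling claimed above can be made concrete on the storage side. The following is a minimal sketch, not taken from the paper: it assumes a 2^d x 2^d matrix stored as d TT cores of shape (r_{k-1}, 2, 2, r_k), with boundary ranks equal to 1 and a single interior rank. The function names `tt_matrix_storage` and `dense_storage` are hypothetical, and the sketch counts stored entries only, not the cost of the ALS-SVD or MALS-SVD iterations.

```python
def tt_matrix_storage(d: int, rank: int) -> int:
    """Entries stored for a 2^d x 2^d matrix in TT (matrix) format.

    Assumption: d cores of shape (r_{k-1}, 2, 2, r_k), boundary ranks
    r_0 = r_d = 1, and every interior rank equal to `rank`.
    """
    ranks = [1] + [rank] * (d - 1) + [1]
    return sum(ranks[k] * 2 * 2 * ranks[k + 1] for k in range(d))


def dense_storage(d: int) -> int:
    """Entries stored for the same matrix kept as a dense array."""
    return (2 ** d) ** 2


# Compare dense and TT storage as the matrix size N = 2^d grows.
for d in (10, 20, 30):
    print(d, dense_storage(d), tt_matrix_storage(d, rank=5))
```

With a fixed rank, the TT count grows linearly in d, that is, logarithmically in the matrix size N = 2^d, while the dense count grows as N^2.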
Cluster Analysis of Medicinal Plants and Targets Based on Multipartite Network
Network-based methods for the analysis of drug-target interactions have gained attention and rely on the paradigm that a single drug can act on multiple targets rather than a single target. In this study, we present a novel approach to analyzing the interactions between the chemicals in medicinal plants and multiple targets, based on a complex multipartite network of medicinal plants, multiple chemicals, and multiple targets. The multipartite network was constructed by combining two relationships: the chemicals contained in plants and the biological actions of those chemicals on the targets. In doing so, we introduced an index of the efficacy of the chemicals in a plant on a protein target of interest, called the target potency score (TPS). We showed that the analysis can identify specific chemical profiles from each group of plants, which can then be employed for discovering new alternative therapeutic agents. Furthermore, specific clusters of plants and chemicals acting on specific targets were retrieved using TPS, suggesting potential drug candidates with a high probability of clinical success. We expect that this approach may open a way to predict the biological functions of multiple chemicals and multiple plants on the targets of interest and enable repositioning of the plants and chemicals.
Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications
Many pattern classification algorithms such as Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs) require data to consist of purely numerical variables. However, many real-world data sets consist of both categorical and numerical variables. In this paper we suggest an effective method of converting mixed data of categorical and numerical variables into data of purely numerical variables for binary classification. Since the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. Also, the suggested method is expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial data sets and real-world data sets are conducted to demonstrate the competitiveness of the suggested method when the number of values in each categorical variable is large and BNCs accurately model the data.
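The abstract does not spell out the conversion itself, so the following is only an illustrative sketch of one simple possibility, not the authors' method. It assumes the simplest Bayesian network classifier (naive Bayes) and replaces each categorical value with the Laplace-smoothed log-likelihood ratio between the two classes. The function name `log_odds_encoder`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def log_odds_encoder(values, labels, alpha=1.0):
    """Map each category v to log P(v | y=1) - log P(v | y=0).

    Assumption: a naive-Bayes-style estimate with Laplace smoothing
    `alpha`; labels are 0 or 1.
    """
    categories = sorted(set(values))
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    n_pos, n_neg, k = sum(pos.values()), sum(neg.values()), len(categories)
    return {
        v: math.log((pos[v] + alpha) / (n_pos + alpha * k))
        - math.log((neg[v] + alpha) / (n_neg + alpha * k))
        for v in categories
    }


# Toy data: one categorical variable and a binary class label.
colors = ["red", "red", "blue", "green", "blue", "red"]
labels = [1, 1, 0, 0, 0, 1]

encoding = log_odds_encoder(colors, labels)
numeric = [encoding[c] for c in colors]  # numerical column for an SVM, MLP, or KNN
print(encoding)
```

Each categorical variable becomes a single numerical column regardless of how many distinct values it has, which is one reason an encoding of this kind could be attractive when the number of values per variable is large.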
Directional dependence between major cities in China based on copula regression on air pollution measurements.
Air pollution is well known as a major risk to public health, causing various diseases including pulmonary and cardiovascular diseases. As social concern increases, the amount of air pollution data is increasing rapidly. The purpose of this study is to statistically characterize dependence between major cities in China based on a measure of directional dependence estimated from PM2.5 measurements. As a measure of the directional dependence, we propose the so-called copula directional dependence (CDD) using beta regression models. An advantage of the CDD is that it does not rely on strict assumptions of specific probability distributions or linearity. We used hourly PM2.5 measurement data collected at four major cities in China (Beijing, Chengdu, Guangzhou, and Shanghai) from 2013 to 2017. After accounting for autocorrelation in the PM2.5 time series via nonlinear autoregressive models, CDDs between the four cities were estimated to produce directed network structures of statistical dependence. In addition, a statistical method was proposed to test the directionality of dependence between each pair of cities. From the PM2.5 data, we found that Chengdu and Guangzhou are the most closely related cities and that the directionality between them changed once between 2013 and 2017, which implies a major economic or environmental change in these Chinese regions.
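Copula-based measures work on margins mapped to the unit interval, and a beta regression needs a response strictly inside (0, 1). The following is a minimal sketch of that preprocessing step only, assuming the common rank-based pseudo-observation u_i = rank_i / (n + 1). The abstract does not state which transform the authors used, and the function name `pseudo_observations` and the sample values are hypothetical.

```python
def pseudo_observations(xs):
    """Rank-transform a sample into (0, 1): u_i = rank_i / (n + 1).

    Tied values receive the average of the ranks they span.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of values tied with xs[order[i]].
        while j + 1 < n and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


# Hypothetical hourly PM2.5 readings for one city (not real data).
city_a = [30.0, 10.0, 20.0]
print(pseudo_observations(city_a))
```

Dividing by n + 1 rather than n keeps every value strictly below 1, so the transformed series can serve as a beta-regression response.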